README File: Replication Package for "School Choice, Student Sorting and Academic Performance"

*******************************
Software Version:
The replication code was built using R version 4.2.3.

*******************************
How to replicate:
- open code/replication/main.R
- change path (line 2) to local path where the "package" folder is located
- go to the folder corresponding to each figure/table to replicate it; see the folder structure shown below for more information
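
The steps above can be sketched in R as follows. This is only an illustration: the location "~/replication/package" and the helper replication_script() are hypothetical, and it assumes main.R must be sourced before a table script because main.R loads the packages and data. The path inside main.R (line 2) still has to be edited by hand.

```r
# Hypothetical location of the unpacked "package" folder; replace with your own.
package_dir <- "~/replication/package"

# Helper (not part of the package): build a path inside code/replication.
replication_script <- function(package_dir, ...) {
  file.path(package_dir, "code", "replication", ...)
}

main_script   <- replication_script(package_dir, "main.R")
table1_script <- replication_script(package_dir, "01_table_01",
                                    "01_table_01_summary_stats.R")

# Source only if both files are found, so a wrong path is reported clearly.
if (file.exists(main_script) && file.exists(table1_script)) {
  source(main_script)    # loads packages and data
  source(table1_script)  # replicates Table 1
} else {
  message("Scripts not found under: ", package_dir)
}
```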

*******************************
Raw data:
The raw administrative data (which should be placed under /data/raw) was publicly available at the time of the analysis:
- http://static.bacalaureat.edu.ro/ (graduation data)
- http://static.admitere.edu.ro/ (admission data)
- http://titularizare.edu.ro/ (teacher data)
- https://www.e-licitatie.ro/pub (expenditure data)

The school openings data was obtained from observing our panel of high schools and then searching for news on the Internet regarding high school openings in a given town/year.

The school location data was obtained using Google Places API (using school names, school addresses provided in some years on http://static.admitere.edu.ro/ and other Education Ministry Sources which included school lists with town codes and GPS coordinates).

The administrative data is no longer publicly available. A formal request for these data should be lodged with the Romanian Education Ministry. This data includes student names and school names, which can then be used to merge the admission and graduation records.

Raw data to final data used in paper:
Under /code/raw to final data, the codes to clean, merge and geocode the data are available. Once the raw data are obtained, simply:
1. run "00_clean_main.R" after changing the local path.
2. run code/raw to final data/Student Teacher Expenditures/main_read_merge_s_t_e.R to create merged student-teacher-expenditure datasets after changing the local path.
3. run code/raw to final data/Student Teacher Expenditures/anonymize_v4.R to anonymize data after changing the local path.
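
A minimal sketch of running the three steps in order is given below. The helpers missing_scripts() and run_pipeline() are hypothetical and not part of the package, the package location is a placeholder, and the location of "00_clean_main.R" is an assumption because it is not stated above. The local path inside each script still has to be changed by hand first.

```r
# Helper (not part of the package): which of the scripts cannot be found?
missing_scripts <- function(scripts) scripts[!file.exists(scripts)]

# Helper (not part of the package): source the scripts in the given order,
# refusing to start if any of them is missing.
run_pipeline <- function(scripts) {
  absent <- missing_scripts(scripts)
  if (length(absent) > 0) {
    stop("Missing script(s): ", paste(absent, collapse = ", "))
  }
  for (s in scripts) {
    message("Running ", s)
    source(s)
  }
  invisible(scripts)
}

package_dir <- "~/replication/package"  # hypothetical location
raw_dir <- file.path(package_dir, "code", "raw to final data")
ste_dir <- file.path(raw_dir, "Student Teacher Expenditures")

scripts <- c(
  file.path(raw_dir, "00_clean_main.R"),          # assumed location
  file.path(ste_dir, "main_read_merge_s_t_e.R"),  # step 2
  file.path(ste_dir, "anonymize_v4.R")            # step 3
)

# Run only when every script is present (requires the restricted raw data).
if (length(missing_scripts(scripts)) == 0) run_pipeline(scripts)
```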

*******************************
Folder Structure and specific replication instructions for each table/figure:
- "code": contains codes
-- "replication": replication codes
-- main.R: loads packages and data

--- 01_table_01: replication files for Table 1 in the text
---- 01_table_01_summary_stats.R: code to run for replication
---- summary_stats_v2.xlsx: output for Table 1

--- 02_table_02: replication files for Table 2 in the text (and corresponding Table A.9)
---- 02_table_02.R: code to run for replication
---- 02_table_02.txt: output for Table 2
---- 02_table_02_track_appendix.R: corresponding Table A.9 in the Online Appendix
---- 02_table_02_track_appendix.txt: output for Table A.9

--- 03_figure_01: replication files for Figure 1 in the text
---- 03_figure_01_student_sorting.R: code to run for replication
---- 03_figure_01.pdf: output for Figure 1 (b&w)
---- 03_figure_01_color.pdf: output for Figure 1 (color)

--- 04_figure_02: replication files for Figure 2 in the text
---- 04_figure_02.R: code to run for replication
---- 04_figure_02.pdf: output for Figure 2 (b&w)
---- 04_figure_02_color.pdf: output for Figure 2 (color)
---- 04_figure_02_percentile_appendix.pdf: output for Figure A.8 in the Online Appendix

--- 05_figure_03: replication files for Table 3 in the text and Figure A.9 in the Online Appendix
---- 05_table_03.R: code to run for replication
---- 05_table_03.txt: output for Table 3 (text)
---- iv_overlap_appendix.pdf: output for Figure A.9 in the Online Appendix

--- 06_figures_03_04_table_04_IV: replication files for Figures 3, 4 and Table 4 (the main instrumental variable results) and robustness checks
---- 06_main.R: code to run for replication
---- 06_figures_03_04_table_04.Rmd: file called by 06_main.R, producing Figures 3, 4 and Table 4
---- 06_regression_and_figure_robustness.Rmd: file called by 06_main.R, produces Table A.23 in the Appendix (sample selection)
---- 06_regression_and_other_outcomes.Rmd: file called by 06_main.R, produces Table A.12 in the Appendix (other outcomes)
---- 06_regression_overidentification.Rmd: file called by 06_main.R, produces Table A.24 in the Appendix (overidentification)
---- 06_table_04.txt: Table 4 output
---- 06_figure_03.pdf: Figure 3 output
---- 06_figure_04.pdf: Figure 4 output
----- other output: .html output and extra figures corresponding to Online Appendix material

--- 07_table_05: replication files for Table 5
---- 07_table_05.R: code to run for replication
---- 07_table_05.txt: output for Table 5 (text)

--- 08_table_06: replication files for Table 6
---- 08_table_06.R: code to run for replication
---- 08_table_06.txt: output for Table 6 (text)

--- 09_table_07: replication files for Table 7
---- 09_table_07.R: code to run for replication
---- 09_table_07.txt: output for Table 7 (text)

--- 10_table_08: replication files for Table 8 and Table A.15 in the Online Appendix
---- 10_table_08.R: code to run for replication
---- 10_table_08.txt: output for Table 8 (text)
---- 10_table_08_track_appendix.txt: output for Table A.15 in the Online Appendix

--- a - ddd: replication files for Table A.17 (school openings DDD) in the Online Appendix
---- a_ddd_main.R: code to run for replication
---- ddd.Rmd: code called by a_ddd_main.R
---- dddd.html: output

--- a - determinants number high schools: replication files for Table A.10 (determinants of school openings) in the Online Appendix
---- main_determinats_of_hs.R: code to run for replication
---- determinants.Rmd: code called by main_determinats_of_hs.R
---- determinants.html: output for Table A.10

--- a - endogenous markets: replication files for Tables A.19 - A.22 (endogenous markets) in the Online Appendix
---- main_endogenous.R: code to run for replication
---- Endogenous_Markets.Rmd: code called by main_endogenous.R
---- Endogenous_Markets.html: output for Tables A.19 - A.22

--- a - endogenous markets: replication files for Table A.18 (endogenous markets) in the Online Appendix
---- matching_nearest.R: code to run for replication
---- a_matching.txt: output for Table A.18


-- "raw to final data": codes used to transform the raw data to the final data - this cannot be run without the raw (restricted) data, which is not provided
--- "Codes": codes used to add unique town codes to towns and schools; all codes in this folder are called by "main_read_mergs_s_t_e.R"
--- "Student Teacher Expenditures": merge student, teacher and spending data
---- "main_read_mergs_s_t_e.R": run this file to merge the data
---- other files: files called by "main_read_mergs_s_t_e.R"

- "data": contains data
-- raw: NOT INCLUDED - raw data used for this project, including admissions, graduation, spending and teacher hiring data. Information on how to obtain this restricted data is given in the "Raw data" section above.
-- intermediate: NOT INCLUDED - intermediate files obtained during cleaning/merge procedure
-- final: final files used in the replication
--- data_student_anon: rds file with matched admissions and graduation records
--- data_teacher_anon: rds file with teacher hiring records
--- data_expenditure_anon: rds file with spending records
--- data_student_teacher_anon: rds file with student records merged to teacher records
--- data_student_expenditure_anon: rds file with student records merged to school spending records
--- data_student_teacher_expenditure_anon: rds file with student records merged to school spending records and teacher hiring records
--- openings_anon: rds file with school openings - this is used in the appendix for the triple difference analysis
--- data_matching_nearest_loose_anon: rds file with students matched using propensity score matching - used for a robustness check in the appendix
--- Population.xlsx: population for different localities in Romania: used to show that the number of schools does not correlate with population changes
--- SIRUTA2-3.xlsx: list of town codes for Romanian towns
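
As a rough illustration, the final files listed above could be read with readRDS(). The helper load_final() is hypothetical, and the ".rds" file extension and the package location are assumptions: the listing only says that these are rds files.

```r
# Helper (not part of the package): read one of the final rds files.
# Assumes the file is stored as <name>.rds under data/final.
load_final <- function(package_dir, name) {
  rds_path <- file.path(package_dir, "data", "final", paste0(name, ".rds"))
  if (!file.exists(rds_path)) {
    stop("File not found: ", rds_path)
  }
  readRDS(rds_path)
}

package_dir <- "~/replication/package"  # hypothetical location

# Example: load the matched admission and graduation records, if present.
student_path <- file.path(package_dir, "data", "final", "data_student_anon.rds")
if (file.exists(student_path)) {
  data_student <- load_final(package_dir, "data_student_anon")
  str(data_student, list.len = 10)  # quick look at the first variables
}
```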
 
*******************************
Variable description (variables used for replication):
- data_student_anon:
*_ms: middle school-level variable
*_hs: high school-level variable
*_adm: high school admission-level variable
*_bac: high school graduation-level variable
judet_*: county
Cod_SIRUTA_*: town code - used to locate schools
Cod_SIRUTA2_*: higher-level town code - used to locate schools
Cod_SIIIR_*: government-attributed code for each school - used to check if a school is the same across time despite possible name changes
Cod_SIRUES_*: government-attributed code for each public institution - used to check if a school is the same across time despite possible name changes
an: year
liceu_repartizat: high school of admission
school_harmonized: high school of admission harmonized over years
town_: town (based on name)
media_la_admitere: admission score (1-10)
media_en_tsu: admission exam score (1-10)
media_de_absolvire: middle school gpa (1-10)
nota_lb_romana: Romanian admission score (1-10)
nota_matematica: Math admission score (1-10)
optiunea_3: elective component admission
nota_optiunea_3: elective score (1-10)
limba_materna: minority language
nota_lb_materna: minority language score (1-10)
specializare_adm: type of hs track
specializare_lb: type of hs track x language of instruction
specializare_bac: type of hs graduation exam (different track types write different bac exams)
specializare_bac2: aggregated type of hs graduation exam (different track types write different bac exams)
id_*: student ID
unitate_de_invatamant: high school where registered during the graduation exam
n_students_hs: number of students in high school
dist: distance between middle school and high school
rezultat: outcome of high school graduation exam
school_change: did the student change schools between admission and graduation?
specializare_bac2: track type at high school graduation (aggregated)
media_0: graduation exam score (1-10), coding missing values as 0
media: graduation exam score (1-10)
entrance_perc: admission score (percentile)
grad_perc: graduation score (percentile)
grad_perc_0: graduation score (percentile including 0s)
entrance_perc_ro: admission score (percentile - Romanian)
entrance_perc_math: admission score (percentile - Math)
grad_perc_ro: graduation score (percentile - Romanian)
grad_perc_math: graduation score (percentile including 0s - Math)
n_hs_town: number of high schools in town
class_mean_yr: average admission score in track-year (percentile)
school_mean_yr: average admission score in high school-year (percentile)
class_mean: average admission score in track (percentile)
school_mean: average admission score in high school (percentile)
dec: admission score decile
quart: admission score quartile
med: above-below median admission score
Wages_* : wages (county-level)
drop_*: school dropout rates (county-level)
Unemployment_*: unemployment rates (county-level)
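
To illustrate how media relates to media_0, and how a percentile variable such as grad_perc_0 might be built from it, here is a small sketch on made-up scores. The percentile definition used here (share of observations at or below a score) and the helper perc_rank() are assumptions; the definition used in the package may differ, for example by ranking within year.

```r
# Made-up graduation exam scores; NA marks a missing score.
media <- c(7.5, NA, 9.2, 6.0)

# media_0: same score, with missing values coded as 0.
media_0 <- ifelse(is.na(media), 0, media)

# Assumed percentile definition: percent of non-missing observations
# at or below each score. Missing scores stay missing.
perc_rank <- function(x) {
  100 * rank(x, ties.method = "max", na.last = "keep") / sum(!is.na(x))
}

grad_perc   <- perc_rank(media)    # missing scores excluded
grad_perc_0 <- perc_rank(media_0)  # missing scores included as 0s

print(data.frame(media, media_0, grad_perc, grad_perc_0))
```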

- data_expenditure_anon:
CaloareEUR: Euro value of purchase
unitate_de_invatamant: high school name
judet.bac: county
an: year
Type: purchase type (auction or direct purchase)

- data_teacher_anon:
(additional variables)
County: county
Town: town
Year: year of hiring
Subject.teacher: teacher's taught subject
Grade_Written_perc: written teacher exam percentile
Year.written: year teacher took exam
Grade_Year: graduation year
Education_num: teacher education
Graduation_Grade: teacher GPA
Teacher_Category: teacher's rank
school_harmonized: high school
town_hs_bac: town
n_hs_town_year: number of high schools in town in year
teacher_perc: teacher percentile score (aggregate)
                              
- data_student_teacher:
(additional variables)
*.ro: variable related to Romanian teachers
*.mand: variable related to mandatory bac subject (e.g. math for science students and history for humanities students)
*.elect: variable related to elective bac subject (e.g. physics for science students and philosophy for humanities students)                             
n_teacher: number of teachers

- data_student_expenditure:
(additional variables)
Expenditure: expenditures in Euro

- data_student_teacher_expenditure_anon:
(additional variables)


